---
title: "CSCI E-63C Week 13 Problem Set"
author: "Ethan Mallove"
date: "November 27, 2019"
output:
  html_document:
    toc: true
    number_sections: true
---
The goal of this problem set is to develop some intuition about the impact of the number of nodes in the hidden layer of a neural network. We will use a few simulated examples, so that we have a clear understanding of the structure of the data being modeled, and will assess how the performance of the neural network model is affected by the structure in the data and the setup of the network.
First of all, to compensate for the lack of coverage of this topic in ISLR, let's go over a couple of simple examples. We start by simulating a simple two-class dataset in a 2D predictor space, with an outcome representative of an interaction between attributes. (Please note that for the problems you will be working on this week, you will be asked below to simulate a dataset using a different model.)
```r
# fix seed so that narrative always matches the plots:
set.seed(1234567890)
nObs <- 1000
ctrPos <- 2
xyTmp <- matrix(rnorm(4*nObs), ncol = 2)
xyCtrsTmp <- matrix(sample(c(-1, 1)*ctrPos, nObs*4, replace = TRUE), ncol = 2)
xyTmp <- xyTmp + xyCtrsTmp
gTmp <- paste0("class", (1 + sign(apply(xyCtrsTmp, 1, prod)))/2)
plot(xyTmp, col = as.numeric(factor(gTmp)), pch = as.numeric(factor(gTmp)), xlab = "X1", ylab = "X2")
abline(h = 0)
abline(v = 0)
```
Symbol color and shape indicate class. This is a typical problem that will challenge any approach estimating a single linear decision boundary. We used similar simulated data for the random forest (week 10) problem set.
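As a point of reference for the error calculations requested below, here is a minimal, self-contained sketch (assuming the `neuralnet` package is available; the reduced sample size and hidden-layer size are illustrative choices, not prescribed) of fitting a small network to data simulated the same way and reading off the training error accumulated by `err.fct` from `result.matrix`:

```r
library(neuralnet)

set.seed(1234567890)
# re-simulate a smaller version of the XOR-like data above
ctr <- matrix(sample(c(-2, 2), 2 * 200, replace = TRUE), ncol = 2)
d <- data.frame(matrix(rnorm(2 * 200), ncol = 2) + ctr)
colnames(d) <- c("X1", "X2")
d$cls <- (1 + sign(ctr[, 1] * ctr[, 2])) / 2   # 0/1 class from center signs

nnFit <- neuralnet(cls ~ X1 + X2, data = d, hidden = 3,
                   err.fct = "ce", linear.output = FALSE)
# training error accumulated by err.fct, as stored by neuralnet:
nnFit$result.matrix["error", 1]
```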
Simulate data with n = 1000 observations and p = 3 covariates, all drawn from the uniform distribution on the \([-1, 1]\) interval: \(X_i \sim \mathcal{U}(-1, 1),\ i \in \{1, 2, 3\}\). The resulting observations should uniformly fill a 3D cube centered at \((0, 0, 0)\) with edge length \(2\). Create a two-category class variable assigning all observations within a smaller cube (also centered at \((0, 0, 0)\), with edge length \(2^\frac{2}{3}\)) to one class category, and observations outside of this cube to the second class category.
Please note that this dataset is entirely different from the one used in the preface: you will need to write the code simulating it on your own. A somewhat related 2D dataset was used as a motivational example in the week 12 (SVM) lecture, before introducing kernels and SVMs. However, the example in the lecture was in 2D (the problem here is three-dimensional), had a spherical boundary (here we work with nested cubes), and used a normal distribution (here it is uniform). Since you will be reusing this code in the following two problems, it is probably best to turn this procedure into a function with appropriate parameters.
Ten points available for this problem are composed of accomplishing the following tasks:
- Correct implementation of the function generating data as prescribed above (2 points)
- Check and demonstrate numerically that the resulting class assignment splits these observations, subject to sampling variability, evenly between these two groups (2 points)
- Explain why this is achieved by using \(2^\frac{2}{3}\) as the length of the inner cube edge that defines the true decision boundary (2 points)
- Plot values of the resulting covariates projected onto each pair of axes, indicating the class to which each observation belongs with symbol color and/or shape (you can use the function `pairs`, for example) (2 points)
- Reflect on the geometry of this problem by answering the following question: what is the smallest number of planes in 3D space that would completely enclose the points from the "inner" class? Is this number equal to the number of cube faces, or is it something smaller? Larger? Please note that "enclose" above does not mean "perfectly discriminate between the points assigned to the two classes" (2 points)
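One way to package the simulation procedure described above into a reusable function (a hedged sketch; the function and argument names are illustrative, not prescribed by the assignment):

```r
# Generate n points uniformly in [-1,1]^3 and label them by whether they
# fall inside the inner cube of the given edge length (illustrative names).
genCubeData <- function(n, innerEdge = 2^(2/3)) {
  x <- matrix(runif(3 * n, -1, 1), ncol = 3)
  colnames(x) <- paste0("X", 1:3)
  # inside the inner cube iff every |X_i| is below half its edge length:
  cls <- factor(ifelse(rowSums(abs(x) < innerEdge / 2) == 3, "inner", "outer"))
  data.frame(x, cls = cls)
}

d <- genCubeData(1000)
# roughly 50/50 split: inner cube volume (2^(2/3))^3 = 4 is half of 2^3 = 8
table(d$cls)
pairs(d[, 1:3], col = as.numeric(d$cls), pch = as.numeric(d$cls))
```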
For the dataset simulated above, fit neural networks with 1 through 6 nodes in a single hidden layer (use the `neuralnet` implementation). For each of them, calculate the training error (see the example in the Preface where it was calculated using the `err.fct` field in the result returned by `neuralnet`). Simulate another independent dataset (with n = 10,000 observations, to make the resulting test error estimates less variable) using the same procedure as above (3D, two classes as nested cubes) and use it to calculate the test error at each number of hidden nodes. Plot the training and test errors as a function of the number of nodes in the hidden layer. What does the resulting plot tell you about the interplay between model error, model complexity and problem geometry? What is the geometrical interpretation of this error behavior?
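The sweep over hidden-layer sizes could be sketched as below (assuming the `neuralnet` package; `genCube`, `classErr` and the 0.5 classification cutoff are illustrative choices of this sketch, not part of the assignment text):

```r
library(neuralnet)
set.seed(1)

# nested-cubes data as described above (illustrative helper)
genCube <- function(n) {
  x <- matrix(runif(3 * n, -1, 1), ncol = 3)
  colnames(x) <- paste0("X", 1:3)
  data.frame(x, cls = as.numeric(rowSums(abs(x) < 2^(-1/3)) == 3))
}
# misclassification rate at a 0.5 cutoff on the network output
classErr <- function(prob, y) mean((prob > 0.5) != y)

dTrain <- genCube(1000)
dTest  <- genCube(10000)
errs <- NULL
for (h in 1:6) {
  nnFit <- neuralnet(cls ~ X1 + X2 + X3, data = dTrain, hidden = h,
                     err.fct = "ce", linear.output = FALSE, stepmax = 2e5)
  pTr <- compute(nnFit, dTrain[, 1:3])$net.result
  pTe <- compute(nnFit, dTest[, 1:3])$net.result
  errs <- rbind(errs, data.frame(hidden = h,
                                 train = classErr(pTr, dTrain$cls),
                                 test  = classErr(pTe, dTest$cls)))
}
matplot(errs$hidden, cbind(errs$train, errs$test), type = "b", pch = 1:2,
        col = 1:2, xlab = "hidden nodes", ylab = "misclassification rate")
legend("topright", c("train", "test"), pch = 1:2, col = 1:2, lty = 1:2)
```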
Set up a simulation repeating the procedure described above for n = 100, 200 and 500 observations in the training set, as well as adding none, 1, 2 and 5 null variables to the training and test data (and to the covariates in the formula provided to `neuralnet`). Draw values for the null variables from the uniform distribution on \([-1, 1]\) as well, and do not use them in the assignment of observations to the class category (e.g. `x <- matrix(runif(600, -1, 1), ncol = 6); cl <- as.numeric(factor(rowSums(abs(x[, 1:3]) < 2^(-1/3)) == 3))` creates a dataset with three informative and three null variables). Repeat the calculation of training and test errors at least several times for each combination of sample size, number of null variables and size of the hidden layer, simulating a new training and test dataset every time, to assess the variability in those estimates. Present the resulting error rates so that the effects of sample size and fraction of null variables can be discerned, and discuss their impact on the resulting model fits.
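Since the set of covariates changes with the number of null variables, the model formula has to be rebuilt accordingly. A minimal sketch of generating data with null variables and assembling the matching formula programmatically (variable names here are illustrative):

```r
set.seed(2)
nNull <- 2   # illustrative choice; the assignment asks for 0, 1, 2 and 5
n <- 200

# informative covariates X1..X3 plus nNull noise covariates, all U(-1,1)
x <- matrix(runif((3 + nNull) * n, -1, 1), ncol = 3 + nNull)
colnames(x) <- paste0("X", seq_len(ncol(x)))
# class depends only on the first three (informative) covariates:
cls <- as.numeric(rowSums(abs(x[, 1:3]) < 2^(-1/3)) == 3)
d <- data.frame(x, cls = cls)

# formula listing every covariate present in the data, null ones included:
frm <- as.formula(paste("cls ~", paste(colnames(x), collapse = " + ")))
frm
```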
Use `neuralnet` to model the outcome in the WiFi localization dataset that we used in previous weeks, using the outcome in its original, four-level format. Your `neuralnet` models will need to have four output nodes to represent the outcome with four levels. Obtain training and test error for this model for several sizes of the hidden layer. Compare the resulting test error in predicting Location 3 to that observed for the random forest, SVM and KNN approaches earlier in the course.
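`neuralnet` does not accept a multi-level factor outcome directly, so the four-level location has to be one-hot encoded into four indicator columns that all appear on the left-hand side of the formula, giving the network four output nodes. A sketch of that encoding on a small synthetic stand-in (the WiFi file itself is not read here; the seven-signal-plus-location layout and all column names are assumptions of this sketch):

```r
set.seed(3)
# synthetic stand-in shaped like the WiFi data: 7 signals + 4-level location
wifi <- data.frame(matrix(rnorm(7 * 20), ncol = 7))
colnames(wifi) <- paste0("S", 1:7)
wifi$Loc <- sample(1:4, 20, replace = TRUE)

# one indicator column (one output node) per location level:
for (k in 1:4) wifi[[paste0("L", k)]] <- as.numeric(wifi$Loc == k)

# all four indicators on the left-hand side give neuralnet four output nodes
frm <- as.formula(paste(paste(paste0("L", 1:4), collapse = " + "), "~",
                        paste(paste0("S", 1:7), collapse = " + ")))
frm
```

With this formula (and `linear.output = FALSE`), the predicted class for each observation would be the output node with the largest activation, e.g. via `apply(..., 1, which.max)` on the matrix returned by `compute`.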
For reproducibility purposes it is always a good idea to capture the state of the environment that was used to generate the results:
```r
sessionInfo()
```

```
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS 10.14.2
##
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ggplot2_3.2.1 neuralnet_1.44.2
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.2 knitr_1.24 magrittr_1.5 tidyselect_0.2.5
## [5] munsell_0.5.0 colorspace_1.4-1 R6_2.4.0 rlang_0.4.1
## [9] stringr_1.4.0 dplyr_0.8.3 tools_3.5.1 gtable_0.3.0
## [13] xfun_0.9 withr_2.1.2 htmltools_0.4.0 lazyeval_0.2.2
## [17] digest_0.6.22 assertthat_0.2.1 tibble_2.1.3 crayon_1.3.4
## [21] purrr_0.3.2 glue_1.3.1 evaluate_0.14 rmarkdown_1.15
## [25] stringi_1.4.3 compiler_3.5.1 pillar_1.4.2 scales_1.0.0
## [29] pkgconfig_2.0.3
```
The time it took to knit this file from beginning to end is about (seconds):
```r
proc.time() - ptStart
```

```
## user system elapsed
## 29.232 2.546 32.001
```